Entities are storage containers used to hold data. They are the physical building blocks from which XML documents are constructed. Every XML document has at least one entity, the document entity, which serves as the base entity for the entire document. Except for the document entity (and the external DTD subset), every entity must have a name and a piece of data associated with it.
Entities must be declared before they are used, and may be internal or external, parsed or unparsed, and general or parameter. Each of these types of entities will be discussed shortly. The various entity types can only be created in certain combinations, and must follow rules (based on the entity type) as to where they are declared (defined) and where they are used:
Type | Declared in | Used In | Always Parsed? |
---|---|---|---|
Internal General | Internal DTD | XML Doc | Yes |
External General | External DTD | XML Doc | No |
Internal Parameter | Internal DTD | Internal DTD | Yes |
External Parameter | External DTD | External DTD | Yes |
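The first and third rows of the table can be illustrated with a short Java sketch. It declares an internal general entity (writer) and an internal parameter entity (greetingDecl) in the internal DTD subset. The parameter entity is referenced inside the DTD, and its replacement text declares a further general entity (greeting). The program assumes the SAX parser bundled with the JDK through JAXP; the class name and sample document are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class InternalEntityDemo {
    // Internal DTD subset: an internal general entity (writer) and an internal
    // parameter entity (greetingDecl) whose replacement text is a complete
    // declaration of another general entity (greeting).
    static final String SAMPLE =
          "<!DOCTYPE note [\n"
        + "  <!ENTITY writer \"Donald Duck.\">\n"
        + "  <!ENTITY % greetingDecl \"<!ENTITY greeting 'Hello'>\">\n"
        + "  %greetingDecl;\n"
        + "]>\n"
        + "<note>&greeting;, &writer;</note>";

    // Parses the document and returns its character data with entities expanded.
    static String expand(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(expand(SAMPLE)); // Hello, Donald Duck.
    }
}
```

Both general references in the document body are expanded by the parser, so the handler only ever sees their replacement text.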
The XMLReader interface supplies the methods getFeature(), setFeature(), getProperty() and setProperty() for querying and tailoring the behavior of parsers.
Parsers are free to invent their own features and properties; some features and properties have standard names.
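```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class SaxConfig {
    // Placeholder application handler: DefaultHandler2 implements LexicalHandler
    static class MyLexicalHandler extends DefaultHandler2 { }

    public static void main(String[] args) throws Exception {
        XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // ..SAX features and properties..
        xmlReader.setProperty(
            "http://xml.org/sax/properties/lexical-handler",
            new MyLexicalHandler()
        );
        System.out.println(
            xmlReader.getFeature(
                "http://xml.org/sax/features/external-general-entities")
            + "\n" +
            xmlReader.getProperty(
                "http://xml.org/sax/properties/lexical-handler")
        );
    }
}
```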
Since features are identified by (absolute) URIs, anyone can define such features. Currently defined standard feature URIs have the prefix http://xml.org/sax/features/ before an identifier such as external-general-entities. Turn features on or off using setFeature.
http://xml.org/sax/features/external-general-entities | |
---|---|
True: | Include external general entities. |
False: | Do not include external general entities. |
Default: | true |
Access: | (parsing) read-only; (not parsing) read-write |

http://xml.org/sax/features/external-parameter-entities | |
---|---|
True: | Include external parameter entities and the external DTD subset. |
False: | Do not include external parameter entities or the external DTD subset. |
Default: | true |
Access: | (parsing) read-only; (not parsing) read-write |

http://apache.org/xml/features/scanner/notify-char-refs | |
---|---|
True: | Notifies the handler of character reference boundaries in the document via the start/endEntity callbacks. |
False: | Does not notify of character reference boundaries. |
Default: | false |

http://apache.org/xml/features/scanner/notify-builtin-refs | |
---|---|
True: | Notifies the handler of built-in entity boundaries (e.g. `&amp;`) in the document via the start/endEntity callbacks. |
False: | Does not notify of built-in entity boundaries. |
Default: | false |
For a complete list of features, see the Apache Xerces2 site or the SAX Project site.
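A minimal sketch of feature handling, assuming the JDK's bundled JAXP parser: it switches off the two standard external-entity features listed above and then tries the Xerces-specific notify-char-refs feature. Because a parser may not recognize a URI, or may refuse a value (for example while a parse is in progress), the hypothetical helper trySetFeature catches SAXNotRecognizedException and SAXNotSupportedException and returns false instead of failing.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureDemo {
    static final String EXTERNAL_GENERAL =
        "http://xml.org/sax/features/external-general-entities";
    static final String EXTERNAL_PARAMETER =
        "http://xml.org/sax/features/external-parameter-entities";
    static final String NOTIFY_CHAR_REFS =
        "http://apache.org/xml/features/scanner/notify-char-refs";

    // Hypothetical helper: tries to set a feature and reports whether the parser
    // accepted it, instead of propagating the two SAX configuration exceptions.
    static boolean trySetFeature(XMLReader reader, String uri, boolean value) {
        try {
            reader.setFeature(uri, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false; // unknown URI, or the value cannot be set right now
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Standard SAX features: do not include external entities.
        trySetFeature(reader, EXTERNAL_GENERAL, false);
        trySetFeature(reader, EXTERNAL_PARAMETER, false);

        // Xerces-specific feature: other parsers may not recognize this URI.
        boolean accepted = trySetFeature(reader, NOTIFY_CHAR_REFS, true);

        System.out.println("external-general-entities: "
            + reader.getFeature(EXTERNAL_GENERAL));
        System.out.println("notify-char-refs accepted: " + accepted);
    }
}
```

Treating an unrecognized feature as a soft failure keeps application code portable across parsers that implement different sets of vendor features.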
For parser interface characteristics that are described as objects, a separate namespace is defined. The objects in this namespace are again identified by URI, and the standard property URIs have the prefix http://xml.org/sax/properties/ before an identifier such as lexical-handler or dom-node. Manage those properties using setProperty().
http://xml.org/sax/properties/declaration-handler | |
---|---|
Desc: | Set the handler for DTD declarations. |
Type: | org.xml.sax.ext.DeclHandler |
Access: | read-write |

http://xml.org/sax/properties/lexical-handler | |
---|---|
Desc: | Set the handler for lexical parsing events. |
Type: | org.xml.sax.ext.LexicalHandler |
Access: | read-write |
For a complete list of properties, see the Apache Xerces2 site or the SAX Project site.
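As a sketch of the declaration-handler property, the Java class below (its name and sample document are illustrative) registers an org.xml.sax.ext.DeclHandler and records every internal entity declaration reported while the DTD is parsed. It assumes the JDK's bundled JAXP parser, which is expected to support this standard property.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;

class DeclarationDemo {
    static final String DECLARATION_HANDLER =
        "http://xml.org/sax/properties/declaration-handler";

    static final String SAMPLE =
        "<!DOCTYPE doc [<!ENTITY writer \"Donald Duck.\">]><doc>&writer;</doc>";

    // Returns entity name -> replacement text for each internal entity declaration.
    static Map<String, String> internalEntities(String xml) throws Exception {
        Map<String, String> entities = new LinkedHashMap<>();
        DeclHandler handler = new DeclHandler() {
            @Override public void elementDecl(String name, String model) { }
            @Override public void attributeDecl(String eName, String aName,
                    String type, String mode, String value) { }
            @Override public void internalEntityDecl(String name, String value) {
                entities.put(name, value);
            }
            @Override public void externalEntityDecl(String name,
                    String publicId, String systemId) { }
        };
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setProperty(DECLARATION_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return entities;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(internalEntities(SAMPLE).get("writer")); // Donald Duck.
    }
}
```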
The LexicalHandler interface introduces the methods startEntity() and endEntity() for handling the events raised at the boundaries of entities the parser recognizes and expands.
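The LexicalHandler callbacks for entity boundaries are startEntity(String name) and endEntity(String name). The sketch below registers an org.xml.sax.ext.DefaultHandler2, which implements LexicalHandler with empty methods, under the lexical-handler property and records those calls. Because the replacement text of outer refers to inner, the recorded events are expected to nest the same way. The class name and sample document are illustrative, and the JDK's bundled JAXP parser is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class EntityBoundaryDemo {
    // outer's replacement text contains a reference to inner
    static final String SAMPLE =
          "<!DOCTYPE doc [\n"
        + "  <!ENTITY inner \"in\">\n"
        + "  <!ENTITY outer \"[&inner;]\">\n"
        + "]>\n"
        + "<doc>&outer;</doc>";

    // Returns the startEntity/endEntity calls in the order they were reported.
    static List<String> entityEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startEntity(String name) { events.add("start:" + name); }
            @Override public void endEntity(String name) { events.add("end:" + name); }
        };
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(entityEvents(SAMPLE));
    }
}
```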
The setFeature() method of XMLReader may ask the parser to skip external general entities, rather than include them, by setting the standard feature flag http://xml.org/sax/features/external-general-entities to false.
The skippedEntity() method of the ContentHandler interface may be used to catch skipped-entity events.
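Parsers are not obliged to honor setFeature() requests for ignoring entities; a parser that cannot set the requested value may throw SAXNotSupportedException. For handling CDATA events, the LexicalHandler interface introduces the methods startCDATA() and endCDATA().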
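A sketch combining these callbacks, again assuming the JDK's bundled JAXP parser: it asks the parser not to include external general entities, records ContentHandler.skippedEntity() calls, and records the LexicalHandler.startCDATA() and endCDATA() boundaries. The system identifier chapter.xml is hypothetical; because the entity is skipped, the parser is not expected to open it. As noted above, setFeature() may throw if a parser cannot honor the request.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class SkipAndCdataDemo {
    // chapter.xml is a hypothetical external general entity
    static final String SAMPLE =
          "<!DOCTYPE doc [<!ENTITY chapter SYSTEM \"chapter.xml\">]>"
        + "<doc>&chapter;<![CDATA[<not-a-tag/>]]></doc>";

    static List<String> collect(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            // ContentHandler: an entity the parser did not include
            @Override public void skippedEntity(String name) {
                events.add("skipped:" + name);
            }
            // LexicalHandler: boundaries of a CDATA section
            @Override public void startCDATA() { events.add("startCDATA"); }
            @Override public void endCDATA() { events.add("endCDATA"); }
        };
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // A parser may refuse this request by throwing SAXNotSupportedException.
        reader.setFeature(
            "http://xml.org/sax/features/external-general-entities", false);
        reader.setContentHandler(handler);
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(SAMPLE));
    }
}
```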
Name | Description | Symbol |
---|---|---|
`&lsquo;` | left single quote | ‘ |
`&rsquo;` | right single quote | ’ |
`&ldquo;` | left double quote | “ |
`&rdquo;` | right double quote | ” |
`&dagger;` | dagger | † |
`&Dagger;` | double dagger | ‡ |
`&permil;` | per mill sign | ‰ |
`&spades;` | black spade suit | ♠ |
`&clubs;` | black club suit | ♣ |
`&hearts;` | black heart suit | ♥ |
`&diams;` | black diamond suit | ♦ |
`&larr;` | leftward arrow | ← |
`&uarr;` | upward arrow | ↑ |
`&rarr;` | rightward arrow | → |
`&darr;` | downward arrow | ↓ |
`&trade;` | trademark sign | ™ |
| Name | Code | Description | Symbol |
|---|---|---|---|
| | `&#9;` | horizontal tab | |
| | `&#10;` | line feed | |
| | `&#32;` | space | |
| | `&#33;` | exclamation mark | ! |
| `&quot;` | `&#34;` | double quotation mark | " |
| | `&#35;` | number sign | # |
| | `&#36;` | dollar sign | $ |
| | `&#37;` | percent sign | % |
| `&amp;` | `&#38;` | ampersand | & |
| | `&#39;` | apostrophe | ' |
| | `&#40;` | left parenthesis | ( |
| | `&#41;` | right parenthesis | ) |
| | `&#42;` | asterisk | * |
| | `&#43;` | plus sign | + |
| | `&#44;` | comma | , |
| | `&#45;` | hyphen | - |
| | `&#46;` | period | . |
| | `&#48;` - `&#57;` | digits 0-9 | 0 - 9 |
| | `&#58;` | colon | : |
| | `&#59;` | semicolon | ; |
| `&lt;` | `&#60;` | less-than sign | < |
| | `&#61;` | equals sign | = |
| `&gt;` | `&#62;` | greater-than sign | > |
| | `&#63;` | question mark | ? |
| | `&#64;` | at sign | @ |
| | `&#65;` - `&#90;` | uppercase letters A-Z | A - Z |
| | `&#91;` | left square bracket | [ |
| | `&#92;` | backslash | \ |
| | `&#93;` | right square bracket | ] |
| | `&#94;` | caret | ^ |
| | `&#95;` | horizontal bar (underscore) | _ |
| | `&#96;` | grave accent | ` |
| | `&#97;` - `&#122;` | lowercase letters a-z | a - z |
| | `&#123;` | left curly brace | { |
| | `&#124;` | vertical bar | \| |
| | `&#125;` | right curly brace | } |
| | `&#126;` | tilde | ~ |
| | `&#127;` - `&#159;` | unused (control characters) | |
| `&ndash;` | `&#8211;` | en dash | – |
| `&mdash;` | `&#8212;` | em dash | — |
| `&nbsp;` | `&#160;` | nonbreaking space | |
| `&iexcl;` | `&#161;` | inverted exclamation | ¡ |
| `&cent;` | `&#162;` | cent sign | ¢ |
| `&pound;` | `&#163;` | pound sterling | £ |
| `&curren;` | `&#164;` | general currency sign | ¤ |
| `&yen;` | `&#165;` | yen sign | ¥ |
| `&brvbar;` | `&#166;` | broken vertical bar | ¦ |
| `&sect;` | `&#167;` | section sign | § |
| `&uml;` | `&#168;` | umlaut | ¨ |
| `&copy;` | `&#169;` | copyright | © |
| `&ordf;` | `&#170;` | feminine ordinal | ª |
| `&laquo;` | `&#171;` | left angle quote | « |
| `&not;` | `&#172;` | not sign | ¬ |
| `&shy;` | `&#173;` | soft hyphen | |
| `&reg;` | `&#174;` | registered trademark | ® |
| `&macr;` | `&#175;` | macron accent | ¯ |
| `&deg;` | `&#176;` | degree sign | ° |
| `&plusmn;` | `&#177;` | plus or minus | ± |
| `&sup2;` | `&#178;` | superscript two | ² |
| `&sup3;` | `&#179;` | superscript three | ³ |
| `&acute;` | `&#180;` | acute accent | ´ |
| `&micro;` | `&#181;` | micro sign | µ |
| `&para;` | `&#182;` | paragraph sign | ¶ |
| `&middot;` | `&#183;` | middle dot | · |
| `&cedil;` | `&#184;` | cedilla | ¸ |
| `&sup1;` | `&#185;` | superscript one | ¹ |
| `&ordm;` | `&#186;` | masculine ordinal | º |
| `&raquo;` | `&#187;` | right angle quote | » |
| `&frac14;` | `&#188;` | one-fourth | ¼ |
| `&frac12;` | `&#189;` | one-half | ½ |
| `&frac34;` | `&#190;` | three-fourths | ¾ |
| `&iquest;` | `&#191;` | inverted question mark | ¿ |
| `&Agrave;` | `&#192;` | uppercase A, grave accent | À |
| `&Aacute;` | `&#193;` | uppercase A, acute accent | Á |
| `&Acirc;` | `&#194;` | uppercase A, circumflex accent | Â |
| `&Atilde;` | `&#195;` | uppercase A, tilde | Ã |
| `&Auml;` | `&#196;` | uppercase A, umlaut | Ä |
| `&Aring;` | `&#197;` | uppercase A, ring | Å |
| `&AElig;` | `&#198;` | uppercase AE | Æ |
| `&Ccedil;` | `&#199;` | uppercase C, cedilla | Ç |
| `&Egrave;` | `&#200;` | uppercase E, grave accent | È |
| `&Eacute;` | `&#201;` | uppercase E, acute accent | É |
| `&Ecirc;` | `&#202;` | uppercase E, circumflex accent | Ê |
| `&Euml;` | `&#203;` | uppercase E, umlaut | Ë |
| `&Igrave;` | `&#204;` | uppercase I, grave accent | Ì |
| `&Iacute;` | `&#205;` | uppercase I, acute accent | Í |
| `&Icirc;` | `&#206;` | uppercase I, circumflex accent | Î |
| `&Iuml;` | `&#207;` | uppercase I, umlaut | Ï |
| `&ETH;` | `&#208;` | uppercase Eth, Icelandic | Ð |
| `&Ntilde;` | `&#209;` | uppercase N, tilde | Ñ |
| `&Ograve;` | `&#210;` | uppercase O, grave accent | Ò |
| `&Oacute;` | `&#211;` | uppercase O, acute accent | Ó |
| `&Ocirc;` | `&#212;` | uppercase O, circumflex accent | Ô |
| `&Otilde;` | `&#213;` | uppercase O, tilde | Õ |
| `&Ouml;` | `&#214;` | uppercase O, umlaut | Ö |
| `&times;` | `&#215;` | multiplication sign | × |
| `&Oslash;` | `&#216;` | uppercase O, slash | Ø |
| `&Ugrave;` | `&#217;` | uppercase U, grave accent | Ù |
| `&Uacute;` | `&#218;` | uppercase U, acute accent | Ú |
| `&Ucirc;` | `&#219;` | uppercase U, circumflex accent | Û |
| `&Uuml;` | `&#220;` | uppercase U, umlaut | Ü |
| `&Yacute;` | `&#221;` | uppercase Y, acute accent | Ý |
| `&THORN;` | `&#222;` | uppercase THORN, Icelandic | Þ |
| `&szlig;` | `&#223;` | lowercase sharp s, German | ß |
| `&agrave;` | `&#224;` | lowercase a, grave accent | à |
| `&aacute;` | `&#225;` | lowercase a, acute accent | á |
| `&acirc;` | `&#226;` | lowercase a, circumflex accent | â |
| `&atilde;` | `&#227;` | lowercase a, tilde | ã |
| `&auml;` | `&#228;` | lowercase a, umlaut | ä |
| `&aring;` | `&#229;` | lowercase a, ring | å |
| `&aelig;` | `&#230;` | lowercase ae | æ |
| `&ccedil;` | `&#231;` | lowercase c, cedilla | ç |
| `&egrave;` | `&#232;` | lowercase e, grave accent | è |
| `&eacute;` | `&#233;` | lowercase e, acute accent | é |
| `&ecirc;` | `&#234;` | lowercase e, circumflex accent | ê |
| `&euml;` | `&#235;` | lowercase e, umlaut | ë |
| `&igrave;` | `&#236;` | lowercase i, grave accent | ì |
| `&iacute;` | `&#237;` | lowercase i, acute accent | í |
| `&icirc;` | `&#238;` | lowercase i, circumflex accent | î |
| `&iuml;` | `&#239;` | lowercase i, umlaut | ï |
| `&eth;` | `&#240;` | lowercase eth, Icelandic | ð |
| `&ntilde;` | `&#241;` | lowercase n, tilde | ñ |
| `&ograve;` | `&#242;` | lowercase o, grave accent | ò |
| `&oacute;` | `&#243;` | lowercase o, acute accent | ó |
| `&ocirc;` | `&#244;` | lowercase o, circumflex accent | ô |
| `&otilde;` | `&#245;` | lowercase o, tilde | õ |
| `&ouml;` | `&#246;` | lowercase o, umlaut | ö |
| `&divide;` | `&#247;` | division sign | ÷ |
| `&oslash;` | `&#248;` | lowercase o, slash | ø |
| `&ugrave;` | `&#249;` | lowercase u, grave accent | ù |
| `&uacute;` | `&#250;` | lowercase u, acute accent | ú |
| `&ucirc;` | `&#251;` | lowercase u, circumflex accent | û |
| `&uuml;` | `&#252;` | lowercase u, umlaut | ü |
| `&yacute;` | `&#253;` | lowercase y, acute accent | ý |
| `&thorn;` | `&#254;` | lowercase thorn, Icelandic | þ |
| `&yuml;` | `&#255;` | lowercase y, umlaut | ÿ |
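Most names in these tables are HTML entity names. XML itself predefines only amp, lt, gt, apos and quot, so a name such as copy must be declared in the DTD before it is used, while a numeric character reference such as `&#169;` needs no declaration. The following sketch, assuming the JDK's bundled SAX parser and an illustrative class name, declares copy in the internal DTD subset and shows that an undeclared reference is rejected as a well-formedness error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class CharacterEntityDemo {
    // Returns the character data of a document, with all references expanded.
    static String text(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // copy is declared in the internal DTD subset; the numeric reference
        // &#169; and the predefined &amp; need no declaration.
        String declared =
            "<!DOCTYPE p [<!ENTITY copy \"&#169;\">]><p>&copy; &#169; &amp;</p>";
        System.out.println(text(declared).equals("\u00A9 \u00A9 &")); // true

        // Without a declaration, &copy; is a well-formedness (fatal) error.
        try {
            text("<p>&copy;</p>");
        } catch (SAXParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```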